Close connection on transform error #707
Merged
Discovered while investigating #706 but quite likely unrelated.
I had previously expected that when a transform returns a `Result::Err` the connection would be immediately dropped.
Upon writing an integration test to prove this, I discovered that it is in fact not the case.
Currently a transform returning `Result::Err` results in no reply being sent to the client and the connection remaining open, very likely leaving the client waiting indefinitely for a response.
So this PR changes shotover to drop the connection immediately when a transform returns a `Result::Err`.
If a transform wishes to avoid dropping the connection when encountering an error, it should handle the error internally.
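To illustrate the intended behaviour, here is a minimal sketch using only the standard library. All names in it (`ConnectionState`, `strict_transform`, `lenient_transform`, `run_connection`) are hypothetical and are not shotover's actual API; the sketch only models "an `Err` from the transform closes the connection" versus "the transform handles the error itself and keeps the connection open".

```rust
// Hypothetical model only: none of these names come from shotover.
#[derive(Debug, PartialEq)]
enum ConnectionState {
    Open,
    Closed,
}

/// Stand-in for a transform that propagates its failure to the caller.
fn strict_transform(request: &str) -> Result<String, String> {
    if request == "bad" {
        Err("transform failed".to_string())
    } else {
        Ok(format!("reply:{request}"))
    }
}

/// Stand-in for a transform that handles the error internally by turning
/// it into an error reply, so the connection is never dropped.
fn lenient_transform(request: &str) -> Result<String, String> {
    match strict_transform(request) {
        Ok(reply) => Ok(reply),
        Err(e) => Ok(format!("error:{e}")),
    }
}

/// Processes requests in order. The first `Err` from the transform stops
/// processing and reports the connection as closed.
fn run_connection(
    requests: &[&str],
    transform: fn(&str) -> Result<String, String>,
) -> (Vec<String>, ConnectionState) {
    let mut replies = Vec::new();
    for &request in requests {
        match transform(request) {
            Ok(reply) => replies.push(reply),
            // Drop the connection immediately instead of leaving the
            // client waiting for a reply that will never arrive.
            Err(_) => return (replies, ConnectionState::Closed),
        }
    }
    (replies, ConnectionState::Open)
}
```

With `strict_transform`, the request after `"bad"` is never processed because the connection is already closed; with `lenient_transform`, the client receives an error reply and the connection stays open.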
There were also some changes needed to ensure the send/receive tasks are terminated when the main task is terminated.
flushing
I hit an issue where the final messages sent were being lost.
The fix for that was to flush any pending messages before terminating the send task.
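The idea of flushing before shutdown can be sketched roughly as follows. This is an assumption-laden model using `std::sync::mpsc` and an `AtomicBool` shutdown flag, not shotover's real send task; `send_task` and `flush_demo` are hypothetical names, and the returned `Vec` stands in for bytes written to the socket.

```rust
use std::sync::atomic::{AtomicBool, Ordering};
use std::sync::mpsc::{self, Receiver, RecvTimeoutError};
use std::sync::Arc;
use std::time::Duration;

/// Hypothetical send task: returns everything it "wrote" to the client.
fn send_task(outgoing: Receiver<String>, shutdown: Arc<AtomicBool>) -> Vec<String> {
    let mut written = Vec::new();
    loop {
        if shutdown.load(Ordering::SeqCst) {
            // Flush: drain messages that were queued before shutdown was
            // requested. Returning here without draining would lose them.
            while let Ok(message) = outgoing.try_recv() {
                written.push(message);
            }
            return written;
        }
        match outgoing.recv_timeout(Duration::from_millis(10)) {
            Ok(message) => written.push(message),
            // Nothing to send yet; loop around and re-check the flag.
            Err(RecvTimeoutError::Timeout) => continue,
            // All senders are gone, so no more messages can arrive.
            Err(RecvTimeoutError::Disconnected) => return written,
        }
    }
}

/// Queues a final message and requests shutdown before the send task gets
/// a chance to run, mimicking the "message sent, then connection closed"
/// timing that exposed the bug.
fn flush_demo() -> Vec<String> {
    let (tx, rx) = mpsc::channel();
    let shutdown = Arc::new(AtomicBool::new(false));
    tx.send("final reply".to_string()).unwrap();
    shutdown.store(true, Ordering::SeqCst);
    send_task(rx, shutdown)
}
```

In this model the final message still reaches the client even though shutdown was requested before the send task looked at its queue.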
This was very tricky to track down because of the connection closing logic needed by Cassandra version handling.
It looked like the problem was caused by a bug in that complicated logic, but in fact that logic merely exposed the bug often, because it sends a message and then closes the connection almost immediately.